A Novel Approach for NearDuplicate Detection of Web Pages using TDW Matrix
نویسندگان
چکیده
منابع مشابه
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable the attackers to hide and obfuscate infectious codes with new methods and thus escaping the security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
متن کاملthe use of appropriate madm model for ranking the vendors of mci equipments using fuzzy approach
abstract nowadays, the science of decision making has been paid to more attention due to the complexity of the problems of suppliers selection. as known, one of the efficient tools in economic and human resources development is the extension of communication networks in developing countries. so, the proper selection of suppliers of tc equipments is of concern very much. in this study, a ...
15 صفحه اولA Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...
متن کاملFormation interface detection using Gamma Ray log: A novel approach
There are two methods for identifying formation interface in oil wells: core analysis, which is a precise approach but costly and time consuming, and well logs analysis, which petrophysists perform, which is subjective and not completely reliable. In this paper, a novel coupled method was proposed to detect the formation interfaces using GR logs. Second approximation level (a2) of GR log gained...
متن کاملA Novel Crawling Algorithm for Web Pages
Crawler is a main component of search engines. In search engines, crawler part is responsible for discovering and downloading web pages. No search engine can cover whole of the web, thus it has to focus on the most valuable web pages. Several Crawling algorithms like PageRank, OPIC and FICA have been proposed, but they have low throughput. To overcome the problem, we propose a new crawling algo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Computer Applications
سال: 2011
ISSN: 0975-8887
DOI: 10.5120/2374-3128